Abstract

The 3C-like proteinase of severe acute respiratory syndrome (SARS) coronavirus has been proposed as a
key target for structure-based drug design against SARS. We designed and synthesized 34 peptide
substrates and determined their hydrolysis activities. The conserved core sequence of the native
cleavage site is optimized for high hydrolysis activity. Residues at positions P4, P3, and P3' are
critical for substrate recognition and binding, and an increased β-sheet conformational tendency is
also helpful. A comparative molecular field analysis (CoMFA) model was constructed. Based on the
mutation data and the CoMFA model, a multiply mutated octapeptide, S24, was designed for higher
activity. The experimentally determined hydrolysis activity of S24 is the highest among all designed
substrates and is close to that predicted by CoMFA. These results offer helpful information for
research on the mechanism of substrate recognition by coronavirus 3C-like proteinases.

© 2005 Elsevier Inc. All rights reserved. 
Human coronaviruses are major causes of upper respiratory tract illness in humans. A novel
coronavirus has been identified as the major cause of severe acute respiratory syndrome (SARS), a
disease that spread rapidly from southern China to several countries in 2003 [1,2]. Coronaviruses
are positive-stranded RNA viruses with the largest viral RNA genomes known to date. The SARS
coronavirus replicase gene encodes two overlapping translation products, polyproteins 1a
(~450 kDa) and 1ab (~750 kDa), which are similar in both length and amino acid sequence to other
coronavirus replicase proteins. Polyproteins 1a and 1ab are cleaved by the internally encoded
3C-like proteinase to release functional proteins necessary for virus replication. The SARS 3C-like
proteinase is fully conserved among all released SARS coronavirus genome sequences and is highly
homologous with other coronavirus 3C-like proteinases. Because of its functional importance in the
viral life cycle, the SARS 3C-like proteinase has been proposed as a key target for structure-based
drug design against SARS [3].

The crystal structures of the 3C-like proteinases of human coronavirus (HCoV), transmissible
gastroenteritis virus (TGEV), and SARS-associated coronavirus (SARS-CoV) have been solved, and all
of them are dimeric [3-5]. The structure of the coronavirus 3C-like proteinase contains three
domains. The first two domains form a chymotrypsin fold, which is responsible for the catalytic
reaction, and the extra helical domain plays an important role in controlling the
association-dissociation equilibrium and in regulating the activity and specificity of the enzyme
[6]. The crystal structures of the TGEV and SARS-CoV 3C-like proteinases in complex with a
hexapeptidyl chloromethyl ketone inhibitor have also been solved [3,4], providing structural
insight into the substrate specificity of coronavirus 3C-like proteinases. As shown in the crystal
structure of the TGEV 3C-like proteinase-inhibitor complex, the well-defined S1, S2, and S4
specificity pockets indicate that the residues at the P1, P2, and P4 positions of the substrate are
critical for substrate binding [3].

The substrate specificity of coronavirus 3C-like proteinases is determined mainly by the P1, P2,
and P1' positions [7]. The P1 position has a well-conserved Gln residue and P2 a hydrophobic one.
Unlike previously identified coronavirus 3C-like proteinases, which have Leu/Ile at position P2,
the SARS 3C-like proteinase also tolerates Phe, Val, and Met at P2. The P1' position requires a
small aliphatic residue, usually Ser or Ala. Hegyi and Ziebuhr demonstrated conserved substrate
selectivity among different coronavirus 3C-like proteinases using an HPLC-based peptide cleavage
assay [7]. This result points to the possibility of developing both a universally applicable
3C-like proteinase assay and broad-spectrum inhibitors that block all coronavirus 3C-like
proteinases. Coronavirus 3C-like proteinases share the chymotrypsin fold and similar substrate
specificities with the 3C proteinases of other viruses, such as rhinoviruses and other
picornaviruses [3,8,9].
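The position nomenclature used throughout (P5...P1 counted leftwards from the scissile bond, P1'...P3' counted rightwards) can be made concrete with a short sketch. The function names, the `|` cleavage marker, and the residue sets below are illustrative assumptions drawn from the qualitative description above; the check is a rough filter, not a predictor of cleavage rates.

```python
def label_positions(peptide, site="|"):
    """Map each residue to its position label (P5..P1, P1'..P3').

    `peptide` is a one-letter sequence with a single cleavage marker,
    e.g. "SAVLQ|SGF" (the marker character is an assumption of this sketch).
    """
    n_side, c_side = peptide.split(site)
    labels = {}
    # Residues N-terminal to the scissile bond: P1 is closest to the bond.
    for i, residue in enumerate(reversed(n_side), start=1):
        labels[f"P{i}"] = residue
    # Residues C-terminal to the scissile bond: P1' is closest to the bond.
    for i, residue in enumerate(c_side, start=1):
        labels[f"P{i}'"] = residue
    return labels


# Residue sets taken from the text: Gln at P1, Leu/Ile (plus Phe, Val, Met
# for the SARS enzyme) at P2, and usually Ser or Ala at P1'.
P2_ALLOWED = set("LIFVM")
P1_PRIME_ALLOWED = set("SA")


def matches_core(peptide):
    """Return True if P2, P1, and P1' fit the described core preferences."""
    pos = label_positions(peptide)
    return (pos["P1"] == "Q"
            and pos["P2"] in P2_ALLOWED
            and pos["P1'"] in P1_PRIME_ALLOWED)


native = label_positions("SAVLQ|SGF")
print(native["P2"], native["P1"], native["P1'"])
print(matches_core("SAVLQ|SGF"), matches_core("SAVLN|SGF"))
```

Such a filter only encodes residue identity at three positions; the measured activities reported below show that positions outside the core also matter.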

To study the substrate specificity of the SARS 3C-like proteinase, we previously cloned, expressed,
and purified the protein and studied its activity towards 11 peptides covering the 11 cleavage
sites on the virus polyproteins. Those results confirmed that the purified SARS 3C-like proteinase
is active towards substrate peptides mapped from the cleavage sites on the polyprotein and revealed
a relationship between the reaction activity and the secondary structure content of the substrates
[10]. In the current study, we have examined the substrate requirements for SARS 3C-like proteinase
binding using 34 truncated and mutated substrate peptides. This study helps to clarify the
mechanism of substrate recognition and catalysis by coronavirus 3C-like proteinases.


Materials and methods 

Peptide synthesis. The natural substrate peptide S12 was designed based on the N-terminal
self-cleavage site of the SARS 3C-like proteinase, with the sequence
Ser-Ala-Val-Leu-Gln-Ser-Gly-Phe-CONH2. Other substrates were designed by truncating the N- and
C-termini or by mutating S12. All the designed substrate peptides were synthesized by solid-phase
peptide synthesis using the standard 9-fluorenylmethoxycarbonyl/tert-butyl strategy. Cleavage of
the peptide from Rink resin and removal of all side-chain-protecting groups were achieved in
trifluoroacetic acid solution. The crude peptide was purified by reversed-phase high performance
liquid chromatography (RP-HPLC, Elite P200II, Dalian, China) on a Zorbax C18 semi-preparative
column (9.4 by 250 mm, Agilent) with gradients of water/acetonitrile containing 0.1%
trifluoroacetic acid. Peptide homogeneity and identity were analyzed by analytical RP-HPLC and
matrix-assisted laser desorption/ionization time-of-flight mass spectrometry, respectively.


Peptide cleavage. The reaction activities of the designed substrate peptides were determined using
an HPLC-based peptide cleavage assay. The C-terminal His-tagged SARS 3C-like proteinase was
expressed and purified as previously described [10]; it has been shown to have enzyme activity
comparable to that of the non-His-tagged protein [11]. The cleavage assay solution, incubated at
room temperature, contained 0.2-0.4 mM substrate peptide, 5.41, 7.41, or 27.05 µM enzyme, and
57 µM DTT in 40 mM Tris-HCl buffer, pH 7.3. Aliquots of the reactions were removed every 10 or
60 min over 1-7 h, stopped by the addition of 0.1% aqueous trifluoroacetic acid, and analyzed by
HPLC (LabPrep System, Gilson) on a Zorbax C18 analytical column (4.6 by 250 mm, Agilent). Cleavage
products were resolved using a 15-min, 10-60% linear gradient of acetonitrile in 0.1%
trifluoroacetic acid, as described previously [10]. The apparent k_cat/K_m was determined by
plotting the substrate peak area against time according to the equation below:

$$\ln \mathrm{PA} = C - (k_{cat}/K_m)_{app}\, c_E\, t \tag{1}$$

where PA is the peak area of the substrate peptide, c_E is the total concentration of His-tagged
3C-like proteinase, t is the reaction time, and C is an experimental constant. Values were averaged
over two independent measurements.
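As an illustration of how Eq. (1) can be applied, the sketch below fits a straight line to ln PA versus time by ordinary least squares and divides the negative slope by the enzyme concentration. The time points and peak areas are synthetic, noise-free values generated from the equation itself (not measured data), and the function name and the choice of mM and min as units are assumptions of this sketch.

```python
import math


def fit_kcat_over_km(times_min, peak_areas, enzyme_mM):
    """Estimate the apparent kcat/Km (mM^-1 min^-1) from substrate decay.

    Eq. (1) is linear in t: ln PA = C - (kcat/Km)app * c_E * t,
    so the least-squares slope of ln PA against t equals -(kcat/Km)app * c_E.
    """
    ys = [math.log(pa) for pa in peak_areas]
    n = len(times_min)
    t_mean = sum(times_min) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in times_min)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times_min, ys))
    slope = sxy / sxx
    return -slope / enzyme_mM


# Synthetic example: 5.41 uM enzyme expressed in mM, sampling every 10 min.
true_value = 2.13            # mM^-1 min^-1, used only to generate test data
enzyme = 5.41e-3             # mM
times = [0, 10, 20, 30, 40, 50, 60]
areas = [1000.0 * math.exp(-true_value * enzyme * t) for t in times]

estimate = fit_kcat_over_km(times, areas, enzyme)
print(round(estimate, 2))
```

With real chromatograms the peak areas carry noise, so the fitted slope would come with an uncertainty; the ± values in the tables presumably reflect the spread between the two independent measurements.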

Results 

Substrate design and peptide cleavage assay 

Previous studies have indicated that coronavirus 3C-like proteinases share a highly conserved
substrate core sequence. We have studied the activities of the SARS 3C-like proteinase towards 11
peptides covering the 11 cleavage sites on the SARS coronavirus polyproteins 1a and 1ab [10].
However, the 11 native cleavage sequences do not provide detailed information about the residue
specificity at each substrate position. Here we designed 34 mutated and truncated peptides based on
the N-terminal self-cleavage site of the SARS 3C-like proteinase, of which 28 are octapeptides and
the other 6 are shorter peptides. The native substrate S12 has the sequence
Ser-Ala-Val-Leu-Gln-Ser-Gly-Phe-CONH2. Different mutations were introduced at positions P5 to P1'.

The dimeric form of the SARS 3C-like proteinase has been shown by several groups to be the major
biologically active form [6,10,12]. The apparent k_cat/K_m is therefore highly dependent on enzyme
concentration. In this study, the proteolytic activities of the substrates were determined at
constant proteinase concentrations (5.41 µM for most octapeptides; 7.41 and 27.05 µM for the
truncated and P2-mutated substrates, respectively, because of their low activities) using an
HPLC-based peptide cleavage assay. The results are listed in Table 1, and only relative activities
are used in further analysis (see Fig. 1).
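Because the apparent rate constant depends on enzyme concentration, a relative activity is only meaningful against the native substrate S12 measured at the same concentration. The sketch below recomputes a few relative activities from values listed in Table 1; the function and variable names are illustrative. Recomputed ratios can differ from the tabulated ones in the last digit, presumably because the published ratios were taken from unrounded measurements.

```python
def relative_activities(measured, reference="S12"):
    """Divide every apparent kcat/Km by that of the reference substrate."""
    ref_value = measured[reference]
    return {name: value / ref_value for name, value in measured.items()}


# Apparent kcat/Km (mM^-1 min^-1) copied from Table 1, grouped by the enzyme
# concentration at which each set was measured.
at_5_41_uM = {"S12": 2.13, "P3K": 5.72, "S24": 9.18}
at_27_05_uM = {"S12": 12.68, "P2F": 0.58, "P2A": 0.058}

rel_low = relative_activities(at_5_41_uM)
rel_high = relative_activities(at_27_05_uM)

for name, value in {**rel_low, **rel_high}.items():
    print(f"{name}: {value:.4f}")
```

Mixing the two groups, for example dividing the P2F value by the S12 value measured at 5.41 µM, would overstate the activity of the P2 mutants.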

The substrate core sequence (position P2 to P1') of
SARS 3C-like proteinase is highly conserved

The coronavirus 3C-like proteinase recognizes a highly conserved core sequence (position P2 to
P1') of Leu-Gln-(Ser/Ala) [3-5,7,13,14]. In this study, residues at positions P2, P1, and P1' were
mutated to amino acids with similar or different properties. The activities of the mutated
substrates indicate that the conserved core sequence confers high hydrolytic activity. Mutation at
these positions decreases the substrate activity significantly or abolishes it completely, as for
the P2R, P1'L, and P1 mutants (see Table 1).

Table 1
Enzyme cleavage activities of mutated substrate peptides

Substrate   Sequence (a)        Apparent k_cat/K_m (mM^-1 min^-1) (b)   (k_cat/K_m)rel
S12         SAVLQ↓SGF-CONH2     2.13 ± 0.20                             1.00
P5F         LAVLQ↓SGF-CONH2     8.32 ± 0.25                             3.90
P5T         TAVLQ↓SGF-CONH2     7.80 ± 0.24                             3.66
P5V         VAVLQ↓SGF-CONH2     7.66 ± 0.30                             3.59
P5A         AAVLQ↓SGF-CONH2     7.10 ± 0.24                             3.33
P4F         SLVLQ↓SGF-CONH2     0.33 ± 0.04                             0.15
P4T         STVLQ↓SGF-CONH2     3.14 ± 0.12                             1.47
P4V         SVVLQ↓SGF-CONH2     5.21 ± 1.27                             2.44
P3F         SALLQ↓SGF-CONH2     1.85 ± 0.06                             0.87
P3T         SATLQ↓SGF-CONH2     2.53 ± 0.54                             1.19
P3A         SAALQ↓SGF-CONH2     0.40 ± 0.03                             0.19
P3K         SAKLQ↓SGF-CONH2     5.72 ± 0.45                             2.68
P1'A        SAVLQ↓AGF-CONH2     4.36 ± 0.19                             2.04
P1'G        SAVLQ↓GGF-CONH2     1.76 ± 0.09                             0.83
P1'L        SAVLQ↓LGF-CONH2     ND (c)
S12         SAVLQ↓SGF-CONH2     12.68 ± 0.60*                           1.000
P2M         SAVMQ↓SGF-CONH2     2.54 ± 0.07*                            0.208
P2F         SAVFQ↓SGF-CONH2     0.58 ± 0.10*                            0.046
P2I         SAVIQ↓SGF-CONH2     0.080 ± 0.011*                          0.0063
P2V         SAVVQ↓SGF-CONH2     0.070 ± 0.009*                          0.0056
P2A         SAVAQ↓SGF-CONH2     0.058 ± 0.008*                          0.0046
P2R         SAVRQ↓SGF-CONH2     ND (c)*
P1N         SAVLN↓SGF-CONH2     ND (c)*
P1E         SAVLE↓SGF-CONH2     ND (c)*
P1K         SAVLK↓SGF-CONH2     ND (c)*
S21         TVVLQ↓SGF-CONH2     8.48 ± 1.32                             3.98
S22         TVTLQ↓SGF-CONH2     5.58 ± 0.21                             2.62
S23         VVTLQ↓SGF-CONH2     5.18 ± 0.74                             2.43
S24         TVKLQ↓AGF-CONH2     9.18 ± 0.25                             4.31

(a) Mutated residues are underlined. Cleavage sites are indicated by ↓.
(b) The concentration of SARS 3C-like proteinase is 5.41 µM for substrates without any label and
27.05 µM for substrates labeled with an asterisk.
(c) Not detectable in the HPLC-based peptide cleavage assay.

Fig. 1. The superimposed 22 substrate structures and the contour plot of the CoMFA model. The
contours indicate that increased positive charge at position P3 is favored (blue) and that a large
hydrophobic residue at position P2 is favored (green), which is compatible with the crystal
structure. (For interpretation of the references to color in this figure legend, the reader is
referred to the web version of this paper.)

Substrates with the small residues Ser, Ala, or Gly at P1', which are commonly found at this
position in 3C-like proteinase cleavage sites, have comparable activities. When P1' is replaced by
the large Leu, no cleavage of the substrate peptide is observed by RP-HPLC. This confirms that
small aliphatic residues are favored at the P1' position.

The single mutation of Gln-P1 to Glu or Asn abolishes the substrate activity completely, which is
compatible with the absolute specificity for Gln at P1. This result strongly suggests that Gln-P1
is critical for substrate binding and cleavage.

The P2 position requires a large hydrophobic residue, as shown in the native cleavage sites. When
Leu-P2 is replaced by Arg, a large positively charged residue, no cleavage is observed in the
HPLC-based assay. These results confirm the hydrophobic interaction between the P2 residue and the
S2 pocket of the enzyme indicated by the crystal structures [3,5]. Mutation to a small residue such
as Ala also decreases the relative activity of the substrate significantly. Although Phe, Val, and
Met are tolerated at P2 by the SARS 3C-like proteinase [10], mutation of Leu-P2 to other
hydrophobic residues decreases the substrate activity by a factor of 5-200 (see Table 1). The
relative activity decreases significantly when the P2 residue is replaced by the β-branched Ile or
Val. This result indicates that steric hindrance is important for residue specificity at the P2
position.

Residue selectivity of the substrate at positions P5, P4,
and P3

Our previous research has shown that a β-strand-like conformation of the substrate is favored for
substrate binding and cleavage [10]. In this study, we further examined the residue specificity and
its relationship with the secondary structure tendency of the residues via substrate mutation.
Residues at positions P5-P3 were mutated to Ala, Leu, Val, and Thr, which differ in their
polarities and secondary structure tendencies. The relative activities of the mutants are listed in
Table 1.

Interestingly, all four mutants at position P5 show a significant increase in activity, by a factor
of 3.33-3.90. This indicates that Ser is not the best residue for position P5. Position P5 shows no
specificity among residues with different secondary structure tendencies, but a decrease in residue
polarity seems to benefit the substrate activity.

When Ala-P4 is replaced by residues with a higher β-sheet tendency, such as Val and Thr, the
substrate activity increases markedly; in contrast, similar mutations at position P3 cause only a
small increase in substrate activity. Notably, the native substrate S12 has an α-helix-preferring
residue, Ala, at position P4 and a β-sheet-preferring residue, Val, at P3. A single mutation of
Ala-P4 to Val or Thr increases the β-sheet tendency of the substrate significantly because it
places β-sheet-preferring residues in tandem at P4 and P3 (Val); a single mutation of Val-P3 to
other residues, however, does not increase the β-sheet preference of the substrate, because no
such cooperation is possible with the α-helix-preferring Ala-P4. This conformational cooperativity
between neighboring residues agrees with the observed activity increases and explains the different
activities of the P4 and P3 mutants. It is further supported by the multiply mutated substrates
S21-S23, which were designed to increase the β-sheet preference of the substrate through tandem
β-sheet-preferring residues and are among the most active substrates in Table 1.

Notably, mutation of Val-P3 to Lys (substrate P3K) causes a 2.68-fold increase in hydrolysis
activity. As shown in the crystal structure, the side chain of Glu-166 of the enzyme is exposed to
solvent and lies near the edge of the S3 specificity pocket. A salt bridge between the side chain
of Lys-P3 and that of Glu-166 may therefore assist substrate binding. This result suggests that an
additional positive charge at position P3 would be beneficial in substrate or inhibitor design.

The relative reaction activities of truncated substrates
indicate that positions P4, P3, and P3' are important for
substrate recognition

The contribution of each residue to the substrate activity was further investigated using truncated
substrates based on the native substrate peptide S12. The activities of the native octapeptide and
6 truncated substrates were determined by the HPLC peptide cleavage assay and are listed in
Table 2.

As shown in Table 2, the activity of the substrate decreases upon truncation. Longer substrates,
which have more residues binding to the specificity pockets of the proteinase, are favored in
enzyme binding and hydrolysis. However, the contributions of residues at different positions differ
significantly. Deletion of Ala-P4 and Val-P3 decreases the activity of the substrate by factors of
11.4 and 7.08, respectively, while substrate S17 (lacking Ser-P5) retains 76.3% of the activity of
the native octapeptide S12. This result suggests that the residues at positions P4 and P3 are
important for substrate recognition and binding. In contrast, the residues at the C-terminus of the
substrate seem to be less important. As shown in Table 2, deletion of Phe-P3' and Gly-P2' decreases
the substrate activity by factors of 5.03 and 1.15, respectively. Phe-P3' nevertheless contributes
substantially to substrate hydrolysis, which may arise from a hydrophobic interaction between its
aromatic side chain and the enzyme.
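The per-residue contributions quoted above are ratios of apparent k_cat/K_m values between a peptide and its one-residue-shorter counterpart. The sketch below reproduces two of them from the Table 2 values. Which pair of peptides the authors compared for each deletion is not stated explicitly, so the pairings used here (S12 versus S17 for Ser-P5, S17 versus S13 for Phe-P3') are assumptions, as are the function names.

```python
# Apparent kcat/Km (mM^-1 min^-1) at 7.41 uM enzyme, copied from Table 2.
TABLE2 = {
    "S12": 7.85,   # SAVLQ|SGF
    "S17": 5.99,   # AVLQ|SGF  (Ser-P5 removed)
    "S13": 1.19,   # AVLQ|SG   (Ser-P5 and Phe-P3' removed)
}


def fold_decrease(longer, shorter, table=TABLE2):
    """Factor by which activity drops when going from `longer` to `shorter`."""
    return table[longer] / table[shorter]


def fraction_retained(longer, shorter, table=TABLE2):
    """Fraction of the longer peptide's activity kept by the shorter one."""
    return table[shorter] / table[longer]


ser_p5_retained = fraction_retained("S12", "S17")
phe_p3prime_fold = fold_decrease("S17", "S13")

print(f"Activity retained without Ser-P5: {ser_p5_retained:.1%}")
print(f"Fold decrease on deleting Phe-P3': {phe_p3prime_fold:.2f}")
```

Comparing peptides that differ by exactly one residue keeps the remaining context fixed, which is why each deletion is paired with its immediate parent rather than with S12.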

Table 2
Enzyme cleavage activities of truncated substrate peptides

Substrate   Sequence (a)        Apparent k_cat/K_m (mM^-1 min^-1) (b)   (k_cat/K_m)rel
S12         SAVLQ↓SGF-CONH2     7.85 ± 0.30                             1.00
S13         AVLQ↓SG-CONH2       1.19 ± 0.03                             0.152
S14         VLQ↓SG-CONH2        0.148 ± 0.006                           0.0189
S15         AVLQ↓S-CONH2        1.01 ± 0.02                             0.129
S16         VLQ↓S-CONH2         0.131 ± 0.006                           0.0167
S17         AVLQ↓SGF-CONH2      5.99 ± 0.08                             0.763
S18         LQ↓SG-CONH2         0.013 ± 0.004                           0.0017

(a) Cleavage sites are indicated by ↓.
(b) The concentration of SARS 3C-like proteinase is 7.41 µM.

Comparative molecular field analysis and application in
predicting new substrate

The quantitated lg(k_cat/K_m) of 22 substrates was used for comparative molecular field analysis
(CoMFA) [15]. The complex structures of the SARS 3C-like proteinase and the substrates were
generated based on the crystal structure of the enzyme (PDB code: 1UK4) using Sybyl 6.91, followed
by energy minimization using 500 steps of the steepest descent method. The structures of the 22
substrates were superimposed on their backbone atoms for CoMFA. To minimize the effect of the
initial orientation of the substrates, the best orientation was searched with an all-orientation
search strategy using the AOS/APS program [16]. CoMFA was performed using the QSAR module in Sybyl
6.91. The standard grid spacing of 0.2 nm was chosen. The steric and electrostatic field energies
were calculated using an sp3 carbon probe atom with a +1 charge. The steric and electrostatic
energy cutoffs were set to 125.4 kJ/mol.
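CoMFA settings are often quoted in ångströms and kcal/mol rather than in the SI-style units used here. As a convenience, the sketch below converts the grid spacing, the energy cutoff, and the column-filtering value reported in this section; the helper names are illustrative, and the conversion factor assumes the thermochemical calorie (1 kcal = 4.184 kJ).

```python
KJ_PER_KCAL = 4.184          # thermochemical calorie
ANGSTROM_PER_NM = 10.0


def kj_to_kcal(energy_kj_per_mol):
    """Convert an energy from kJ/mol to kcal/mol."""
    return energy_kj_per_mol / KJ_PER_KCAL


def nm_to_angstrom(length_nm):
    """Convert a length from nanometres to angstroms."""
    return length_nm * ANGSTROM_PER_NM


grid_spacing_A = nm_to_angstrom(0.2)        # grid spacing
cutoff_kcal = kj_to_kcal(125.4)             # steric/electrostatic cutoff
column_filter_kcal = kj_to_kcal(8.36)       # column filtering in the PLS step

print(f"grid spacing  : {grid_spacing_A:.1f} A")
print(f"energy cutoff : {cutoff_kcal:.1f} kcal/mol")
print(f"column filter : {column_filter_kcal:.1f} kcal/mol")
```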

The partial least squares (PLS) analysis, with five components and a column filtering of
8.36 kJ/mol, gave a leave-one-out cross-validated q² of 0.661 and an optimum number of components
of 2. The full-data PLS analysis using the optimum two components gave a conventional correlation
coefficient r² = 0.822, a standard error of estimate s = 0.429, and F(2,20) = 46.213. The relative
contributions of the steric and electrostatic fields in the QSAR equation are 81.8% and 18.2%,
respectively. This result suggests that steric interaction is the major force in substrate
recognition.
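The reported statistics can be cross-checked against one another. Reading the value 0.822 as a squared correlation coefficient (an assumption, supported by the check itself), the standard regression F ratio for 2 components and 20 residual degrees of freedom comes out close to the reported 46.213. The sketch also shows the usual definition of the leave-one-out q², 1 − PRESS/SS; the function names are illustrative and no data from the study are used for q².

```python
def f_statistic(r_squared, df_model, df_residual):
    """F ratio of a regression: (r2 / df_model) / ((1 - r2) / df_residual)."""
    return (r_squared / df_model) / ((1.0 - r_squared) / df_residual)


def q_squared(observed, loo_predicted):
    """Cross-validated q2 = 1 - PRESS / SS.

    `loo_predicted[i]` must be the prediction for sample i from a model
    fitted without sample i (leave-one-out).
    """
    mean_obs = sum(observed) / len(observed)
    press = sum((y - p) ** 2 for y, p in zip(observed, loo_predicted))
    ss_total = sum((y - mean_obs) ** 2 for y in observed)
    return 1.0 - press / ss_total


f_value = f_statistic(0.822, df_model=2, df_residual=20)
print(f"F(2,20) from r2 = 0.822: {f_value:.2f}")
```

The small gap between the recomputed F and 46.213 is what one would expect if the published r² was rounded to three decimals.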

Designing a highly active substrate

A new substrate octapeptide was designed by combining all the factors that favor hydrolysis:
tandem β-sheet-preferring residues at positions P5 and P4, a positively charged Lys at P3, and an
Ala at P1' (see Table 1). The substrate was synthesized and its activity was determined by HPLC.
This substrate, S24, shows the highest hydrolysis activity among all 34 designed substrates, 4.31
times that of the native sequence S12.

This substrate was used to test the predictive ability of the CoMFA model. The lg(k_cat/K_m) of
S24 predicted by our CoMFA model was 0.58, which is close to the experimental value of 0.634. This
result strongly suggests that the CoMFA model can be used to predict the activity of newly designed
peptide substrates. It is also consistent with the findings derived directly from the substrate
activities, especially the relationship between hydrolysis activity and the secondary structure
preference of the substrate.
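The experimental value of 0.634 matches the base-10 logarithm of the relative activity of S24 in Table 1 (4.31), which suggests the model was trained on relative activities. Under that reading, the sketch below converts the predicted value back to a relative activity and expresses the prediction error both in log units and as a ratio; the variable names are illustrative.

```python
import math

relative_activity_s24 = 4.31      # (kcat/Km)rel of S24 from Table 1
predicted_lg = 0.58               # CoMFA prediction quoted in the text

experimental_lg = math.log10(relative_activity_s24)
predicted_relative = 10 ** predicted_lg
residual_lg = experimental_lg - predicted_lg
ratio = relative_activity_s24 / predicted_relative

print(f"experimental lg   : {experimental_lg:.3f}")
print(f"predicted activity: {predicted_relative:.2f} x native")
print(f"residual (lg)     : {residual_lg:.3f}")
print(f"observed/predicted: {ratio:.2f}")
```

A residual of this size is well inside the standard error of estimate reported for the model (s = 0.429), although a single test compound gives only limited evidence of predictive power.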

Discussion 

Previous sequence analyses of coronavirus 3C-like proteinases have suggested a highly conserved
core sequence (position P2 to P1') of Leu-Gln-(Ser/Ala) [3-5,7,13,14]. All 11 cleavage sites in the
SARS coronavirus have a conserved Gln at position P1; 8 of them have a Leu at P2; and 9 have Ala or
Ser at P1'. A similar result has been reported in the sequence analysis of 77 cleavage sites of
3C-like proteinases in different coronaviruses [13]. In this study, mutation at these positions
decreased the substrate activity significantly or abolished it completely. This indicates that the
conserved core sequence confers high hydrolytic activity. Both the highly conserved substrate
sequence and the specific enzyme might be the result of a co-evolutionary process.

The crystal structures of the TGEV and SARS coronavirus 3C-like proteinases in complex with a
hexapeptidyl chloromethyl ketone inhibitor [3,4] provide structural insight into the substrate
specificity of coronavirus 3C-like proteinases. As shown in the crystal structures, the hydrogen
bond formed between the side-chain amide group of Gln-P1 and the imidazole of a conserved His in
the enzyme is critical for substrate recognition and binding [3,4]. Mutation of Gln-P1 to Glu or
Asn is tolerated by the 3C-like proteinases of other viruses, although the substrate activity may
be decreased [17,18]. However, our results and the highly conserved Gln-P1 in native cleavage sites
indicate that the residue specificity at position P1 is more stringent for coronavirus 3C-like
proteinases.

The P2 position requires a large hydrophobic residue, especially Leu, as shown in the native
cleavage sites. Although Phe, Val, and Met are tolerated at P2 by the SARS 3C-like proteinase [10],
mutation of Leu-P2 to other hydrophobic residues decreases the substrate activity dramatically (see
Table 1). Notably, the relative activity of the mutated substrates decreases from 0.208 to 0.046 to
about 0.006 as the P2 residue is replaced by the unbranched Met, the β-phenyl-containing Phe, or
the β-branched Ile and Val. This result indicates that steric hindrance may be critical at this
position.

The crystal structure of the TGEV 3C-like proteinase indicates that the substrate binds in the
shallow substrate-binding site at the surface of the enzyme, between domains I and II of the
chymotrypsin fold. An antiparallel sheet is formed by the substrate peptide (P5-P3), strand eII
(164-167), and a loop (189-191) of the enzyme, and two hydrogen bonds are formed between the
backbone amides of Ala-P4 and Val-P3 and the backbone carbonyls of Glu-165 and Ser-189 of the
proteinase [3]. Our results on the truncated substrates suggest that these two hydrogen bonds are
critical for substrate binding. In contrast, the C-terminal residues seem to be less important than
the N-terminal residues. This is consistent with a previous study on human rhinovirus 3C protease
by Cordingley et al. [19]. Phe-P3' contributes substantially to substrate hydrolysis, which may
arise from a hydrophobic interaction between its aromatic side chain and the enzyme, and this was
confirmed in the virtual screening of SARS 3C-like proteinase inhibitors [20].

The crystal structure also indicates that an increased β-sheet tendency of the substrate favors
substrate binding, as revealed by our previous work [10]. The native substrate S12 has an
α-helix-preferring Ala at position P4 and a β-sheet-preferring Val at position P3. Conformational
cooperation between neighboring residues can explain the different activities of the P4 and P3
mutants. A single mutation of Ala-P4 to Val or Thr increases the β-sheet tendency of the substrate
significantly because it places β-sheet-preferring residues in tandem at P4 and P3 (Val); a single
mutation of Val-P3 to other residues, however, does not increase the β-sheet preference of the
substrate, because no such cooperation is possible with the α-helix-preferring Ala-P4. These
results suggest that the hydrolysis activity may increase further with an additional
β-sheet-preferring residue at position P5.

Substrates S21-S23 were designed with consecutive β-sheet-preferring residues at positions P5-P3.
Although no secondary structure analysis was done for these substrates because of their low
solubility, their high tendency to aggregate in aqueous solution implies a high β-sheet content.
This offers further evidence for the relationship between the secondary structure tendency and the
activity of substrates of the coronavirus 3C-like proteinase.

A CoMFA model was generated using the relative activities of 22 octapeptide substrates. Its
predictive ability was tested with the multiply mutated substrate S24, which has the highest
activity among all the tested substrates. The lg(k_cat/K_m) of S24 predicted by our CoMFA model is
close to the experimental value. This result strongly suggests that the CoMFA model is useful for
predicting the activity of newly designed peptide substrates for the SARS 3C-like proteinase.

In summary, we have designed and synthesized 34 peptide substrates for the SARS 3C-like proteinase
and determined their hydrolysis activities by an HPLC-based peptide cleavage assay. A CoMFA model
was generated, and its predictive ability was confirmed with a newly designed peptide substrate,
S24. The core sequence of the proteinase cleavage site is highly conserved and optimized for enzyme
recognition and catalysis. Residues at positions P4, P3, and P3' are critical for substrate
recognition and binding, and an increased β-sheet conformational tendency at positions P4 and P3
favors substrate binding and hydrolysis. In addition, a salt bridge between position P3 and
Glu-166 of the enzyme increases the activity of the substrate. These results offer helpful
information for understanding the mechanism of substrate recognition and catalysis by the
coronavirus 3C-like proteinase and for rational drug design against coronaviruses.